Overview

This notebook was developed to accompany the tutorial of a short course offered at the 2017 Annual Meeting of the American Political Science Association. The instructors for the course are Karsten Donnay (University of Konstanz), Eric Dunford (University of Maryland), Andrew Linke (University of Utah), Erin McGrath (University of Maryland), David Backer (University of Maryland), and David Cunningham (University of Maryland). This short course focuses on newly developed software tools designed by the instructors. These tools enable more effective work with multiple datasets that have geospatial properties, which are increasingly employed in research throughout the social sciences. The aim of the course is to familiarize participants with the use of these tools and associated best practices. At the end of the course, participants should understand why and how they could use these tools to support research that requires integrating datasets with particular geospatial properties.

The first part of the notebook walks through the functionality, applications and best practices of the geomerge package, which was just released. This package has been designed primarily to facilitate addressing challenges related to the integration of datasets with different geospatial properties. The package is illustrated using example data for Nigeria 2011. The illustration covers integration of Polygon, Raster and Point data, including how to generate spatial panel data.

The second part of the notebook walks through the functionality, applications and best practices of the meltt package, which was released earlier this year. This package has been designed to facilitate the integration of event data from multiple sources with differing properties. The package is illustrated by drawing on conflict event data from four prominent event datasets covering conflict observed in Nigeria during 2011.

The tutorial is designed to be hands-on, with participants working through the illustrative examples, accessing and processing datasets using the commands available in the geomerge and meltt packages. Doing so requires, at a minimum, an installation of the R programming software. Some knowledge of R is useful, though not mandatory. During this short course and tutorial, participants should learn about the utility, logic, and functionality of the two packages even without any significant expertise in R.

Part 1: geomerge - Geospatial Data Integration

Why we developed geomerge

The use case: In practice, research involving spatial data typically entails drawing on multiple sources that provide information on distinct variables, each with a particular geographical resolution. Conducting analysis requires integrating these variables from the separate datasets into a common data frame, with a geographical resolution that is appropriately comparable across all the variables.

The main challenges: Separate datasets can have very different spatial data formats. For example, information on population or elevation is most often available as Raster data. Information on a country’s administrative subdivisions is typically provided as Polygon data. The locations of conflict events or incidents of crime are usually coded as Point data. In essence, these data formats correspond to different units of observation. Different units imply a spatial mismatch. When spatial data are mismatched, they may not be usable for particular types of analysis (unless purposely considering variables at different units of observation). Separate datasets may also treat the same variable as being of different types (e.g., numeric vs. categorical).

The technical challenges: A whole range of packages in R provide excellent functionality for dealing with these data integration problems, but no single, simple framework combines all this functionality. In addition, integrating different kinds of spatial data requires making assumptions and providing specifications for how to proceed with the integration.

What geomerge Does

The geomerge package provides this framework. The package allows for the automatic, flexible, transparent, reproducible integration of the most common types of spatial data. The integration can produce variables with the same spatial resolution, or merely establish the spatial correspondence of variables with different resolutions. In doing so, the package implements a number of established best practices that ensure robust results for many standard cases, while allowing for customization through optional parameters.

Using geomerge in Research

geomerge supports empirical research using spatial data in several important ways. First, the package streamlines the process of integrating data from multiple sources. Second, the package offers the flexibility of enabling users to generate variants of the same data. Each of these variants can reflect different assumptions about how to perform the integration, including in reference to the choice of spatial unit, as well as the choice of assignment, zonal function or point aggregation rules. Third, the variants can be used to test the robustness of analyses to assumptions about data integration. Fourth, the package contributes to transparency and simplifies replication by providing clear, standardized interfaces that document the assumptions users made when integrating data. The data and code used in performing any integration can be supplied to accompany similar code used in performing analysis.

Installation

The package can be installed through the CRAN repository.

#install.packages("geomerge")

For the purposes of this tutorial, we recommend installing the latest development version of the package from GitHub. To download this version, you may first need to install the devtools package.

# install.packages("devtools")
devtools::install_github("css-konstanz/geomerge")
library(geomerge)

Before we get started, please set your working directory to the directory into which you unpacked the tutorial files (including the “data” directory).

setwd("YOUR DIRECTORY") 

Example Data: Nigeria

In this tutorial, we use a number of different data layers for Nigeria 2011 that constitute the example data distributed with the geomerge package. The data can be easily loaded using

data(geomerge)

The example datasets cover all three main spatial data types discussed above:

  • ACLED (Point data): Conflict events for Nigeria in 2011 as recorded by the Armed Conflict Location & Event Data project, available from http://www.acleddata.com/data. This dataset contains geocoded, timestamped information on individual conflict events.

  • AidData (Point data, including locations geocoded to administrative divisions, but assigned coordinates of centroids): Activities of development aid projects in Nigeria with start dates in 2011 as recorded by AidData, available at http://aiddata.org. This dataset contains geocoded, timestamped information on individual aid projects.

Note: Both Point datasets are time-stamped, which means that they can be used for dynamic (i.e., spanning a spatial panel) as well as static (i.e., cross-sectional) integration.

  • geoEPR (Polygon data): All politically relevant ethnic groups for Nigeria in 2011, as recorded in the EPR-Core 2014 dataset, available at https://icr.ethz.ch/data/epr/geoepr/. This dataset assigns every politically relevant ethnic group one of six settlement patterns and provides polygons describing their location.

  • gpw (Raster data): Population at a gridded resolution of about 4km for Nigeria in 2010, as compiled by CIESIN, available at http://sedac.ciesin.columbia.edu/data/collection/gpw-v4. This dataset provides population estimates at several grid resolutions.

  • states (Polygon data): First-order administrative divisions (ADM1s) for Nigeria, known as States (equivalent of US states). The dataset is available at http://www.arcgis.com/home/item.html?id=0e58995046b74254911c1dc0eb756fa4. This dataset is used in the illustration as the target SpatialPolygonsDataFrame to which spatial data are merged. The polygons in states have been simplified to reduce the size of the SpatialPolygonsDataFrame and enable fast execution of the examples provided.

To familiarize yourself with these datasets, we recommend taking a closer look at them. The following commands plot each dataset and show a handful of sample values:

library(raster)
## Loading required package: sp
## Warning: package 'sp' was built under R version 3.4.3
# Quick overview plot
plot(states)

# Show top rows of dataset
head(states@data)
##   ID  NAME_0    NAME_1
## 0  1 Nigeria      Abia
## 1  2 Nigeria   Adamawa
## 2  3 Nigeria Akwa Ibom
## 3  4 Nigeria   Anambra
## 4  5 Nigeria    Bauchi
## 5  6 Nigeria   Bayelsa
# Quick overview plot
plot(states)
plot(ACLED, new=TRUE,add=TRUE)

# Show top rows of dataset
head(ACLED@data)
##   GWNO EVENT_ID_C EVENT_ID_N  timestamp YEAR TIME_PRECI
## 1  475    2962NIG      67219 2011-01-01 2011          1
## 2  475    2963NIG      67220 2011-01-03 2011          1
## 3  475    2964NIG      67221 2011-01-03 2011          1
## 4  475    2965NIG      67222 2011-01-04 2011          1
## 5  475    2966NIG      67223 2011-01-04 2011          1
## 6  475    2967NIG      67224 2011-01-04 2011          1
##                      EVENT_TYPE
## 1         Strategic development
## 2    Violence against civilians
## 3    Violence against civilians
## 4         Strategic development
## 5                Riots/Protests
## 6 Battle-No change of territory
##                                                       ACTOR1
## 1 Boko Haram - Jama'atu Ahli is-Sunnah lid-Dawatai wal-Jihad
## 2 Boko Haram - Jama'atu Ahli is-Sunnah lid-Dawatai wal-Jihad
## 3                         Unidentified Armed Group (Nigeria)
## 4                              DDM: Delta Democratic Militia
## 5                                          Rioters (Nigeria)
## 6                              PDP: Peoples Democratic Party
##                                                   ALLY_ACTOR INTER1
## 1                                                       <NA>      3
## 2                                                       <NA>      3
## 3                                                       <NA>      3
## 4                                                       <NA>      3
## 5 Boko Haram - Jama'atu Ahli is-Sunnah lid-Dawatai wal-Jihad      5
## 6                                                       <NA>      3
##                             ACTOR2                           ALLY_ACT_1
## 1                             <NA>                                 <NA>
## 2              Civilians (Nigeria) Police Forces of Nigeria (1999-2015)
## 3              Civilians (Nigeria)                                 <NA>
## 4                             <NA>                                 <NA>
## 5                             <NA>                                 <NA>
## 6 RPN: Republican Party of Nigeria                                 <NA>
##   INTER2 INTERACTIO COUNTRY  ADMIN1          ADMIN2 ADMIN3  LOCATION
## 1      0         30 Nigeria Plateau       Jos North   <NA>       Jos
## 2      7         37 Nigeria   Borno       Maiduguri   <NA> Maiduguri
## 3      7         37 Nigeria   Borno       Maiduguri   <NA> Maiduguri
## 4      0         30 Nigeria   Delta   Ughelli North   <NA>   Ughelli
## 5      0         50 Nigeria Adamawa           Gombi   <NA>    Jimeta
## 6      3         33 Nigeria     Oyo Ogbomosho North   <NA>    Orogun
##   LATITUDE LONGITUDE GEO_PRECIS                SOURCE
## 1  9.92849   8.89212          1  Agence France Presse
## 2 11.84644  13.16027          1  Agence France Presse
## 3 11.84644  13.16027          1  Agence France Presse
## 4  5.48986   6.00743          1 BBC Monitoring Africa
## 5  9.28333  12.46667          1 BBC Monitoring Africa
## 6  8.15000   4.28330          1            ChannelsTV
##                                                                                                                                                                                                                                                            NOTES
## 1 Suspected Boko Haram arsonists burnt a church in a northern Nigerian city. Arsonists Saturday night who set a fire on the church that gutted a section of it before the fire was put out by residents. No one was hurt in the attack as there were no worshipp
## 2        Suspected members of a radical Islamist sect blamed for a spate of recent attacks in northern Nigeria shot dead an off-duty policeman in Maiduguri. The victim was wearing civilian clothes and was about to enter his home when the attack took place.
## 3                                        Gunmen killed three people at a movie theatre in a northern city in an attack police believe is politically-motivated ahead of general elections. The assailants were believed to be thugs loyal to a local politician.
## 4 A little-known group calling itself the Delta Democratic Militia claimed responsibility for an arson attack which razed the INEC offices in Delta state to the ground. The group claimed the attack was a warning against electoral malpractice in the upcomin
## 5                                                                                         A riot broke out at Jimeta Prison complex when suspected Boko Haram inmates attempted a prison break by overpowering guards. The attempted break-out was unsuccessful.
## 6                                                                                                                                         Three people were killed in Orogun after a clash between supporters of two governorship candidates of the RPN and PDP.
##   FATALITIES
## 1          0
## 2          1
## 3          3
## 4          0
## 5          0
## 6          3
# Quick overview plot
plot(states)
plot(AidData, new=TRUE,add=TRUE)

# Show top rows of dataset
head(AidData@data)
##     project_id geoname_id precision_  place_name latitude longitude
## 120  104763105    2332453          4       Lagos  6.53774   3.35220
## 126  104895556    2328926          6     Nigeria 10.00000   8.00000
## 141  104924416    2352778          1       Abuja  9.05785   7.49508
## 142  104924722    2332453          4       Lagos  6.53774   3.35220
## 143  104924761    2328926          6     Nigeria 10.00000   8.00000
## 144  104924802    2328927          2 Niger Delta  4.83333   6.00000
##     location_t                            geoname_ad
## 120       ADM1                 6295630|6255146|NG|05
## 126       PCLI                    6295630|6255146|NG
## 141       PPLC 6295630|6255146|NG|11|8635054|2352778
## 142       ADM1                 6295630|6255146|NG|05
## 143       PCLI                    6295630|6255146|NG
## 144       DLTA                 6295630|6255146|NG|00
##                                                                      geoname__1
## 120                                                  Earth|Africa|Nigeria|Lagos
## 126                                                        Earth|Africa|Nigeria
## 141 Earth|Africa|Nigeria|Federal Capital Territory|Municipal Area Council|Abuja
## 142                                                  Earth|Africa|Nigeria|Lagos
## 143                                                        Earth|Africa|Nigeria
## 144                                            Earth|Africa|Nigeria|Niger Delta
##     aiddata_id aiddata_2_ year     donor donor_iso donor_regi
## 120  104763105       <NA> 2011    Norway        NO     Europe
## 126  104895556       <NA> 2011 Australia        AU    Oceania
## 141  104924416       <NA> 2011    Norway        NO     Europe
## 142  104924722       <NA> 2011    Norway        NO     Europe
## 143  104924761       <NA> 2011    Norway        NO     Europe
## 144  104924802       <NA> 2011    Norway        NO     Europe
##            implementi financing_ crs_bi_mul recipient recipient_
## 120  Carbon Limits AS      NORAD          1   Nigeria         NG
## 126     Public Sector     AusAID          1   Nigeria         NG
## 141 Jose Manuel Ramos        MFA          1   Nigeria         NG
## 142  INCAS Consulting        MFA          1   Nigeria         NG
## 143  INCAS Consulting        MFA          1   Nigeria         NG
## 144  INCAS Consulting        MFA          1   Nigeria         NG
##                  recipient1  timestamp   end_date commitment planned_st
## 120 Africa, South of Sahara 2011-10-10 31/12/2012 2011/01/01       <NA>
## 126 Africa, South of Sahara 2011-07-01  30/6/2018 2011/01/01       <NA>
## 141 Africa, South of Sahara 2011-09-29 31/12/2012 2011/01/01       <NA>
## 142 Africa, South of Sahara 2011-10-31 31/12/2012 2011/01/01       <NA>
## 143 Africa, South of Sahara 2011-04-07 31/12/2011 2011/01/01       <NA>
## 144 Africa, South of Sahara 2011-03-31 31/12/2011 2011/01/01       <NA>
##     planned_en
## 120       <NA>
## 126       <NA>
## 141       <NA>
## 142       <NA>
## 143       <NA>
## 144       <NA>
##                                                                   title
## 120             Lagos State Gov - CDM development - sawdust utilization
## 126                                      ADS Intake 2012 - Consolidated
## 141                                          JPO Ingrid Midtgaard UNODC
## 142                                                  Cultural week 2012
## 143                           Fridtjov Nansen Nigeria-Sao Tome JDZ 2011
## 144 Integrated Stabilisation Framework, Niger Delta. Konflikthindtering
##                                                              short_desc
## 120             LAGOS STATE GOV - CDM DEVELOPMENT - SAWDUST UTILIZATION
## 126                                      ADS INTAKE 2012 - CONSOLIDATED
## 141                                          JPO INGRID MIDTGAARD UNODC
## 142                                                  CULTURAL WEEK 2012
## 143                           FRIDTJOV NANSEN NIGERIA-SAO TOME JDZ 2011
## 144 INTEGRATED STABILISATION FRAMEWORK, NIGER DELTA. KONFLIKTHINDTERING
##                                                                                                                                                                                                                                                        long_descr
## 120                                                                                     CDM project development - The project focuses on utilization of the biomass waste in the sawmill community of Okobaba for a biomass fuel, thereby reducing CO2-emissions.
## 126                                                                                                                                                                          In-Australia costs for all Australian Development Awards long and short term courses
## 141                                                                                                                                                                           JPO Ingrid Midtgaard UNODC. Duty station: Abuja, Nigeria. Sector: Criminal Justice.
## 142                                                                                                                                                                                                      Cultural and business week in Lagos 22 -25 February 2012
## 143 Nigeria-Sao Tome & Principe Joint Development Authority (JDA) has asked the Norwegian Government to use the research vessel Dr. Fridtjov Nansen to investigate the marine resources, oceanography and environmental monitoring in connection with their newly
## 144                                                                                                                                               Integrated Stabilisation Framework for the Niger Delta Expert Working Group Rapport Nigeria. Konflikthindtering
##      donor_proj donor_seco aiddata_se
## 120 NGA-12/0001 2011001606        230
## 126      11A758 2011000917        160
## 141 NGA-11/0005 2011001604        151
## 142 NGA-12/0002 2011001607        160
## 143 NGA-10/0012 2011001601        313
## 144 NGA-11/0002 2011001602        152
##                                                 aiddata__1 aiddata_pu
## 120                           Energy generation and supply      23030
## 126               Other social infrastructure and services      16010
## 141                  Government and civil society, general          0
## 142               Other social infrastructure and services          0
## 143                                                Fishing          0
## 144 Conflict prevention and resolution, peace and security          0
##                             aiddata__2        aiddata_ac
## 120 Power generation/renewable sources          23030.07
## 126           Social/ welfare services 16010.07|91010.01
## 141                               <NA>              <NA>
## 142                               <NA>              <NA>
## 143                               <NA>              <NA>
## 144                               <NA>              <NA>
##                                                                                           aiddata__3
## 120                                                                                          Biomass
## 126 Culture and recreation|All items relating to otherwise unspecified adminstrative costs of donors
## 141                                                                                             <NA>
## 142                                                                                             <NA>
## 143                                                                                             <NA>
## 144                                                                                             <NA>
##      flow_name crs_sector                                  crs_sect_1
## 120 ODA Grants        230                                II.3. Energy
## 126 ODA Grants        430                     IV.2. Other Multisector
## 141 ODA Grants        151   I.5.a. Government & Civil Society-general
## 142 ODA Grants        160 I.6. Other Social Infrastructure & Services
## 143 ODA Grants        313                            III.1.c. Fishing
## 144 ODA Grants        152           I.5.b. Conflict, Peace & Security
##     crs_purpos                                                  crs_purp_1
## 120      23070                                                     Biomass
## 126      43081                              Multisector education/training
## 141      15113              Anti-corruption organisations and institutions
## 142      16061                                      Culture and recreation
## 143      31320                                         Fishery development
## 144      15220 Civilian peace-building, conflict prevention and resolution
##     coalesced_                                                  coalesced1
## 120      23030                          Power generation/renewable sources
## 126      16010                                    Social/ welfare services
## 141      15120                          Public sector financial management
## 142      16010                                    Social/ welfare services
## 143      31320                                         Fishery development
## 144      15220 Civilian peace-building, conflict prevention and resolution
##     commitme_1 total_proj crs_trade crs_climat crs_biodiv crs_gender
## 120      44071          0         0          1          0          0
## 126      29217          0         0          0          0          1
## 141      53527          0         0          0          0          0
## 142      17842          0         0          0          0          0
## 143     713699          0         0          0          0          0
## 144     592371          0         0          0          0          0
##     crs_enviro crs_desert pdgg channel_co finance_t associated future_ds_
## 120          0       <NA>    0      52000       C01       <NA>          0
## 126          0       <NA> <NA>      51000       E01       <NA>          0
## 141          0       <NA>    2      41128       D01       <NA>          0
## 142          0       <NA>    0      52000       G01       <NA>          0
## 143          2       <NA>    0      51000       C01       <NA>          0
## 144          0       <NA>    2      52000       C01       <NA>          0
##     future_ds1 received_a irtc_amoun untied_amo tied_amoun partial_ti
## 120          0          0          0      44071          0          0
## 126          0          0          0      29217          0          0
## 141          0          0          0      53527          0          0
## 142          0          0          0          0          0          0
## 143          0          0          0     713699          0          0
## 144          0          0          0     592371          0          0
##     finance_t2 arrears_in arrears_pr initial_re  ftc repay_type outstandin
## 120        C01          0          0          1 <NA>       <NA>          0
## 126        E01          0          0          8    1       <NA>          0
## 141        D01          0          0          1    1       <NA>          0
## 142        G01          0          0          1 <NA>       <NA>          0
## 143        C01          0          0          1 <NA>       <NA>          0
## 144        C01          0          0          1 <NA>       <NA>          0
##     interest_a expert_com export_cre expert_ext additional source
## 120          0          0          0          0       <NA>   OECD
## 126          0          0          0          0       <NA>   OECD
## 141          0          0          0          0       <NA>   OECD
## 142          0          0          0          0       <NA>   OECD
## 143          0          0          0          0       <NA>   OECD
## 144          0          0          0          0       <NA>   OECD
##          source_det
## 120 CRS Online 2012
## 126 CRS Online 2012
## 141 CRS Online 2012
## 142 CRS Online 2012
## 143 CRS Online 2012
## 144 CRS Online 2012
# Quick overview plot
plot(geoEPR)

# Show top rows of dataset
head(geoEPR@data)
##                              EPRgroup
## 0 Hausa-Fulani and Muslim Middle Belt
## 1                              Yoruba
## 2                                Igbo
## 3                                 Tiv
## 4                                Ijaw
## 5                               Ogoni

Using geomerge

The main functionality of the geomerge package is provided by a single function with the same name. The output of the function is an object of class “geomerge”, which is a list with three slots: (1) data contains the spatial data resulting from integration, (2) inputData stores the input dataset, and (3) parameters logs all parameters with which geomerge was executed.
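To make the shape of the return value concrete, here is a minimal sketch that builds a mock list with the same three element names. Only the names data, inputData and parameters come from the description above; the class name and all values are invented for illustration, and a real call to geomerge fills these elements with actual spatial data.

```r
# Mock object mimicking the three elements described above.
# All contents are hypothetical placeholders.
mock_output <- structure(
  list(
    data       = data.frame(FID = 1:2, NAME_1 = c("Abia", "Adamawa")),
    inputData  = list(geoEPR = "placeholder for the input layer"),
    parameters = list(silent = TRUE, assignment = "max(area)")
  ),
  class = "geomerge_mock"  # hypothetical class name, not the package's
)

names(mock_output)                 # the three element names
mock_output$parameters$assignment  # look up one logged parameter
```

Keeping the inputs and parameters next to the integrated data is what allows an integration to be documented and rerun later.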

Running geomerge has two basic requirements.

The first requirement is input data, comprising any number of objects of type SpatialPolygonsDataFrame, SpatialPointsDataFrame and RasterLayer. A RasterLayer is by definition single-valued. For consistency, geomerge requires the user to select one specific variable in each of the SpatialPolygonsDataFrame and SpatialPointsDataFrame objects prior to integration. A SpatialPointsDataFrame may also contain a second column named timestamp, which can be used for dynamic integration.

The rationale is that the package uses the name of the input data to label the corresponding variables in the integrated data. This approach establishes a clear, unique link between the input and integrated data. If a user wishes to work with several variables from the same dataset, simply enter these variables as separate arguments (with unique names). We generally advise users to rely on meaningful names when labeling input data.
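The naming idea can be illustrated with a small base R sketch. The helper input_labels below is hypothetical and is not geomerge's code; it only shows how a function can derive labels from the names of the objects it receives, and how an explicit argument name takes precedence.

```r
# Hypothetical helper: record a label for every object passed via `...`.
# This is NOT the package's implementation, only an illustration.
input_labels <- function(...) {
  calls  <- as.list(substitute(list(...)))[-1]  # unevaluated arguments
  labels <- names(calls)                         # explicit names, if any
  if (is.null(labels)) labels <- rep("", length(calls))
  unnamed <- labels == ""
  # Fall back to the object name as written in the call
  labels[unnamed] <- vapply(
    calls[unnamed],
    function(x) paste(deparse(x), collapse = ""),
    character(1)
  )
  labels
}

conflict <- 1:3  # stand-ins for spatial objects
aid      <- 4:6
input_labels(conflict, aid)           # labels taken from object names
input_labels(events = conflict, aid)  # an explicit name overrides
```

This is why meaningful object names pay off: under such a scheme they become the variable names of the integrated data.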

The second requirement, called target, specifies the spatial structure to which variables from all input objects are merged. The example in the geomerge package requires this target to be of class SpatialPolygonsDataFrame. In practice, the spatial structure can have any shape (e.g., polygons of administrative units, raster cells, etc.).

Note: The package provides a useful helper function called generateGrid, which generates a grid of user-specified cell size for the spatial extent defined by a spatial R object.
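The idea behind such a grid can be sketched in base R. The function make_grid below is a hypothetical stand-in, not the package helper: it assumes a rectangular bounding box in degrees and returns the cell boundaries as a plain data frame rather than a spatial object.

```r
# Hypothetical stand-in for a grid helper (not the package function).
# Builds square cells of side `size` covering the given bounding box.
make_grid <- function(xmin, xmax, ymin, ymax, size) {
  x0 <- seq(xmin, xmax - size, by = size)  # left edges of cells
  y0 <- seq(ymin, ymax - size, by = size)  # bottom edges of cells
  cells <- expand.grid(xmin = x0, ymin = y0)
  cells$xmax <- cells$xmin + size
  cells$ymax <- cells$ymin + size
  cells$FID  <- seq_len(nrow(cells))       # one identifier per cell
  cells
}

# Approximate bounding box around Nigeria, in degrees (illustrative only)
grid <- make_grid(xmin = 2, xmax = 15, ymin = 4, ymax = 14, size = 1)
nrow(grid)  # 13 columns x 10 rows of one-degree cells
```

A grid of this kind can serve as a target when no administrative or other substantive unit is appropriate.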

geomerge assumes that all inputs of type SpatialPolygonsDataFrame and RasterLayer are static and contemporary. If the polygons or raster are changing, we advise users to rerun geomerge for each interval in which data are static and contemporary. The package allows for dynamic integration of all inputs that are a SpatialPointsDataFrame. For example, one can automatically generate the counts of events that occur within a specific unit of target within a specific time period.
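The logic of counting events per unit and time period can be sketched without any spatial machinery. The example below assumes that each event has already been matched to a unit of target; the data are invented, and calendar months are an arbitrary choice of period.

```r
# Toy events already matched to a target unit (values are invented).
events <- data.frame(
  unit      = c("Borno", "Borno", "Plateau", "Borno", "Oyo"),
  timestamp = as.Date(c("2011-01-03", "2011-01-03", "2011-01-01",
                        "2011-02-10", "2011-01-04")),
  stringsAsFactors = FALSE
)

# Define the time period: here, calendar months.
events$period <- format(events$timestamp, "%Y-%m")

# Count events per unit and period. Empty combinations get a zero,
# which yields a balanced unit-by-period panel.
panel <- as.data.frame(table(unit = events$unit, period = events$period),
                       responseName = "n_events")
panel[order(panel$unit, panel$period), ]
```

The zero rows matter: a spatial panel needs an observation for every unit in every period, including those without events.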

geomerge has a number of other optional arguments, which we will explore further below. These optional arguments enable specific kinds of integration (i.e., dynamic vs. static) and/or allow the user to change assumptions about zonal functions, assignment rules, etc. from the default values.

Note: The print, summary and plot functions are overloaded for objects of class “geomerge”, meaning that these functions return specific outputs for objects of class “geomerge”.
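For readers unfamiliar with this mechanism, the sketch below shows how class-specific methods work in R's S3 system. The class toy_merge and its fields are hypothetical; the point is only that a generic such as summary picks the method that matches the class of its argument.

```r
# Hypothetical class used only to illustrate S3 dispatch.
result <- structure(list(n_datasets = 1L, mode = "static"),
                    class = "toy_merge")

# A class-specific method: summary() dispatches to it automatically
# for objects of class "toy_merge".
summary.toy_merge <- function(object, ...) {
  msg <- sprintf("%d dataset(s) integrated - run in %s mode.",
                 object$n_datasets, object$mode)
  cat(msg, "\n")
  invisible(msg)
}

summary(result)
```

The same pattern applies to print and plot: defining print.toy_merge or plot.toy_merge would change what those generics do for this class only.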

Static Integration of Polygon and Raster Data

The simplest case is that of merging static layers. Consider, for example, the case that geo-spatial information about the settlement areas of ethnic groups ought to be merged with the administrative units of a country to determine which group is the dominant faction in each area. In the following examples, we therefore assume that the target of integration is the states SpatialPolygonsDataFrame.

Let’s begin by integrating one Polygon dataset with states.

output = geomerge(geoEPR,target=states)
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(geoEPR, target = states)
## 
##  NOTE: The extent of input geoEPR is smaller than that of target. This might lead to NA values.
## 
##  Running geomerge in static mode.
##  Dataset1: geoEPR (SpatialPolygonsDataFrame)
##  Merging polygon data...
##  NOTE: No spatial lags calculated for geoEPR since data is non-numeric.
##  Done. 
##  Dataset geoEPR successfully merged to target.
##  Completed!
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
## 
## The following 1 non numerical variable(s) are available:
##  geoEPR
names(output$data)
## [1] "FID"    "ID"     "NAME_0" "NAME_1" "area"   "geoEPR"

Notice that the function returns a number of messages documenting the progress of the integration task. When merging more complex data, the function may run for some time, so monitoring progress can be useful. If no printed progress updates are required, simply use the optional argument silent = TRUE.

output = geomerge(geoEPR,target=states,silent=TRUE)
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
## 
## The following 1 non numerical variable(s) are available:
##  geoEPR

Here, the default settings of geomerge make implicit assumptions about how the values in geoEPR are assigned to the target, the states SpatialPolygonsDataFrame. The default assignment rule uses maximum area overlap (assignment = "max(area)"): each spatial unit of target receives the value of the unit in geoEPR with which it has the largest spatial overlap.
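The maximum-overlap rule can be illustrated with a toy table of overlap areas. This is a sketch under the assumption that the overlaps have already been computed; the unit names and areas are invented, and the code is not taken from the package.

```r
# Invented overlap areas (km^2) between target units and input polygons.
overlap <- data.frame(
  target = c("A", "A", "B", "B", "B"),
  group  = c("Yoruba", "Igbo", "Igbo", "Ijaw", "Ogoni"),
  area   = c(120, 30, 40, 75, 10),
  stringsAsFactors = FALSE
)

# For each target unit, keep the value with the largest overlap area.
by_target <- split(overlap, overlap$target)
assigned  <- vapply(by_target,
                    function(d) d$group[which.max(d$area)],
                    character(1))
assigned
```

Replacing which.max with which.min in this sketch gives the corresponding minimum-overlap rule.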

As an alternative, geomerge supports assignment based on minimal area overlap (assignment = "min(area)").

Assignment can also be done by maximum population (assignment = "max(pop)") or minimum population (assignment = "min(pop)"), which operate analogously to the area-based rules.

In addition, geomerge permits assignment weighted by area (assignment = "weighted(area)") or population (assignment = "weighted(pop)"). The former assigns the value that is the area-weighted average across all units intersecting with the spatial unit in target. The latter is analogous, but assigns the value based on the population represented by that area.
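A small worked example may help to see how the two weighting schemes can differ. The numbers below are invented and describe three input polygons that intersect a single target unit; the computation uses base R's weighted.mean and is a sketch of the idea, not the package's code.

```r
# Invented numeric values of input polygons overlapping one target unit.
pieces <- data.frame(
  value = c(10, 20, 40),        # variable carried by each input polygon
  area  = c(50, 30, 20),        # overlap area with the target unit
  pop   = c(1000, 6000, 3000)   # population living in each overlap
)

# Area-weighted average: (10*50 + 20*30 + 40*20) / 100
area_weighted <- weighted.mean(pieces$value, w = pieces$area)

# Population-weighted average: (10*1000 + 20*6000 + 40*3000) / 10000
pop_weighted <- weighted.mean(pieces$value, w = pieces$pop)

c(area = area_weighted, pop = pop_weighted)
```

Here the area-weighted value is 19 and the population-weighted value is 25, because the largest overlap by area holds the fewest people.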

Naturally, all the options relying on population require a population raster input called population.data. Here is an example:

output = geomerge(geoEPR,target=states,
                  silent=TRUE,assignment="max(pop)",
                  population.data=gpw)
## 
##  Generating zonal statistics for population based assignment... Done.
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
## 
## The following 1 non numerical variable(s) are available:
##  geoEPR

Note: Any weighted assignment (whether area- or population-based) is only allowed for numeric data. Within our illustration, therefore, weighted assignment is not possible for the layer geoEPR.
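
The restriction to numeric data follows from the fact that a weighted rule averages values. The sketch below, which is independent of geomerge, contrasts area-based and population-based maximum assignment for a categorical label and shows that a weighted rule cannot be applied to it. The helper assign_value and the toy data are hypothetical, and the sketch assumes that "max(pop)" picks the overlap containing the largest population.

```r
# Hypothetical pieces of one target unit: a categorical label, the overlap
# area, and the population living in each overlap (made-up numbers).
pieces <- data.frame(
  label = c("g1", "g2"),
  area  = c(80, 20),
  pop   = c(2000, 9000),
  stringsAsFactors = FALSE
)

# Weighted rules average the values, so they only make sense for numbers.
assign_value <- function(values, weights, rule = c("max", "weighted")) {
  rule <- match.arg(rule)
  if (rule == "weighted") {
    if (!is.numeric(values)) stop("weighted assignment needs numeric values")
    return(weighted.mean(values, w = weights))
  }
  values[which.max(weights)]
}

by_area <- assign_value(pieces$label, pieces$area, "max")  # largest overlap
by_pop  <- assign_value(pieces$label, pieces$pop, "max")   # most populous overlap

# A weighted rule on the categorical label is rejected.
weighted_ok <- tryCatch(
  { assign_value(pieces$label, pieces$pop, "weighted"); TRUE },
  error = function(e) FALSE
)
```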

The integration of Raster data is similarly straightforward.

Note: geomerge accepts any optional arguments of the function extract in the raster package. These arguments can be entered in the exact same syntax as in the original extract function and are passed on to any use of the function within the package. For example, in the illustration we use the optional input na.rm = TRUE because the gpw data has a few missing values that we want to ignore when performing the data integration.

output = geomerge(gpw,na.rm=TRUE,target=states)
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(gpw, na.rm = TRUE, target = states)
## 
##  Running geomerge in static mode.
##  Dataset1: gpw (RasterLayer)
##  Generating zonal statistics... Done. 
##  Dataset gpw successfully merged to target.
##  Completed!
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
## 
## The following 1 numerical variable(s) are available:
##  gpw
plot(output)

As can be seen in the summary, the package not only merged the layer gpw to states, but also generated its value per area of the target polygon, along with first- and second-order spatial lag values for each. For inputs of type RasterLayer, values per area are always returned as well. Whether spatial lags are calculated is controlled by the optional Boolean argument spat.lag.

output = geomerge(gpw,na.rm=TRUE,target=states,spat.lag=FALSE)
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(gpw, na.rm = TRUE, target = states, spat.lag = FALSE)
## 
##  Running geomerge in static mode.
##  Dataset1: gpw (RasterLayer)
##  Generating zonal statistics... Done. 
##  Dataset gpw successfully merged to target.
##  Completed!
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
## 
## The following 1 numerical variable(s) are available:
##  gpw
plot(output)
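
For readers unfamiliar with spatial lags, the following base R sketch shows one common definition: the first-order lag of a unit is the mean value of its contiguous neighbors, and the second-order lag is the mean over the neighbors of those neighbors. The tutorial does not state which definition geomerge uses, so this is an assumption; the adjacency matrix and the values are invented.

```r
# Hypothetical contiguity among four units arranged in a line: 1-2-3-4.
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)

x <- c(10, 20, 30, 40)  # made-up values for the four units

# Second-order neighbours: reachable in two steps, but neither the unit
# itself nor one of its first-order neighbours.
adj2 <- (adj %*% adj > 0) & (adj == 0)
diag(adj2) <- FALSE

# Row-standardise so that each lag is the mean over the relevant neighbours.
row_mean <- function(w, v) as.vector(w %*% v) / rowSums(w)

lag1 <- row_mean(adj, x)
lag2 <- row_mean(adj2 * 1, x)
print(cbind(x, lag1, lag2))
```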

As in the case of Polygon data, the defaults of geomerge have built-in implicit assumptions regarding zonal statistics. The default zonal function is summation (zonal.fun = sum). The package also supports all zonal statistics consistent with the extract function in the raster package.

output = geomerge(gpw,na.rm=TRUE,target=states,
                  spat.lag=FALSE,zonal.fun=min)
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(gpw, na.rm = TRUE, target = states, spat.lag = FALSE, 
##     zonal.fun = min)
## 
##  Running geomerge in static mode.
##  Dataset1: gpw (RasterLayer)
##  Generating zonal statistics... Done. 
##  Dataset gpw successfully merged to target.
##  Completed!
summary (output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
## 
## The following 1 numerical variable(s) are available:
##  gpw
plot(output)
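
The role of the zonal function and of na.rm can be illustrated without any spatial packages. In this base R sketch a small matrix stands in for the raster and a second matrix records which target unit each cell belongs to; both are made up, and the helper zonal is hypothetical rather than part of geomerge or raster.

```r
# Hypothetical 3 x 4 raster of cell values with one missing cell, and a
# matching matrix saying which target unit (zone) every cell belongs to.
cells <- matrix(c(5, 2, NA, 7,
                  1, 4, 3,  6,
                  8, 9, 2,  1), nrow = 3, byrow = TRUE)
zones <- matrix(c("A", "A", "B", "B",
                  "A", "A", "B", "B",
                  "C", "C", "C", "C"), nrow = 3, byrow = TRUE)

# Apply a summary function to the cells of each zone, dropping missing cells.
zonal <- function(fun) {
  tapply(as.vector(cells), as.vector(zones), fun, na.rm = TRUE)
}

zone_sum <- zonal(sum)
zone_min <- zonal(min)
print(rbind(sum = zone_sum, min = zone_min))
```

Without na.rm = TRUE, the sum and minimum for zone B would both be NA because of the single missing cell.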

Static and Dynamic Integration of Point Data

In geomerge, integration of point data supports two different heuristics, which the user specifies via point.agg. The first heuristic (point.agg = "cnt") counts the occurrence of points in a given unit of target. The second heuristic (point.agg = "sum") sums the values for all points in a given unit. This second heuristic is only appropriate for numeric variables.

To illustrate, we use information on the conflict fatalities as recorded in ACLED and the financial commitments of development aid projects as recorded in AidData. We start by looking at the event counts and the number of projects in each Local Government Area of Nigeria throughout 2011 using point.agg = "cnt". Then we examine the total numbers of conflict fatalities and aid dollar commitments associated with those areas.

# First select the corresponding columns only
ACLED.fatalities = ACLED[,names(ACLED)=='FATALITIES']
AidData.commitment = AidData[,names(AidData)=='commitme_1']
# Run geomerge using point.agg = 'cnt'
output = geomerge(ACLED.fatalities,AidData.commitment,target=states,point.agg='cnt')
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(ACLED.fatalities, AidData.commitment, target = states, 
##     point.agg = "cnt")
## 
##  Running geomerge in static mode.
##  Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
##  Aggregating point data... Done. 
##  Dataset ACLED.fatalities successfully merged to target.
##  Dataset2: AidData.commitment (SpatialPointsDataFrame)
##  Aggregating point data... Done. 
##  Dataset AidData.commitment successfully merged to target.
##  Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in static mode.
## 
## The following 2 numerical variable(s) are available:
##  ACLED.fatalities, AidData.commitment
plot(output)

# Run geomerge using point.agg = 'sum'
output = geomerge(ACLED.fatalities,AidData.commitment,target=states,point.agg='sum')
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(ACLED.fatalities, AidData.commitment, target = states, 
##     point.agg = "sum")
## 
##  Running geomerge in static mode.
##  Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
##  Aggregating point data... Done. 
##  Dataset ACLED.fatalities successfully merged to target.
##  Dataset2: AidData.commitment (SpatialPointsDataFrame)
##  Aggregating point data... Done. 
##  Dataset AidData.commitment successfully merged to target.
##  Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in static mode.
## 
## The following 2 numerical variable(s) are available:
##  ACLED.fatalities, AidData.commitment
plot(output)
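
The difference between the two heuristics can be made explicit with a small base R sketch. It assumes the points have already been matched to target units; the events table is invented, and filling empty units with zero is a choice made for this sketch rather than documented geomerge behavior.

```r
# Hypothetical point events already matched to the target unit they fall in.
events <- data.frame(
  unit       = c("A", "A", "B", "A", "C"),
  fatalities = c(0, 3, 12, 1, 0)
)
units <- c("A", "B", "C", "D")  # unit D contains no events

# 'cnt'-style aggregation: number of points per unit (zero where none occur).
cnt <- table(factor(events$unit, levels = units))

# 'sum'-style aggregation: total of the numeric attribute per unit.
tot <- tapply(events$fatalities, factor(events$unit, levels = units), sum)
tot[is.na(tot)] <- 0

print(rbind(cnt = as.vector(cnt), sum = as.vector(tot)))
```

Unit C illustrates why both measures are useful: it records one event but zero fatalities.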

Dynamic integration of point data follows the same process as before, but separated into a series of temporal units, thereby generating a spatial panel. In geomerge, the temporal units are specified through the time argument. The package performs static integration if time = NA. For dynamic integration, the user must specify time = c(start_date, end_date, interval_length). All three inputs must be strings, where interval_length is defined in multiples of t_unit. The default value is t_unit = "days". The package also accepts inputs of “secs”, “mins”, “hours”, “months” or “years”.
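
To see how such a time specification translates into periods, consider the following base R sketch. It is only an approximation of the idea: it handles date-based units (days, months, years) via seq(), not the sub-daily units, and the way geomerge constructs its intervals internally may differ. The example dates are invented.

```r
# Inputs mirror the string format described above; values are illustrative.
time   <- c("2011-01-01", "2011-12-31", "1")
t_unit <- "months"

start <- as.Date(time[1])
end   <- as.Date(time[2])
step  <- paste(as.integer(time[3]), sub("s$", "", t_unit))  # "1 month"

# Left edges of the periods; each period runs up to the next edge.
breaks <- seq(start, end, by = step)

# Assign hypothetical event dates to the period they fall into.
dates  <- as.Date(c("2011-01-15", "2011-03-02", "2011-12-31"))
period <- findInterval(as.numeric(dates), as.numeric(breaks))

print(length(breaks))
print(period)
```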

In the following illustration, we employ the same data as before, but now include the “timestamp” column from both datasets. Information on the timing of observations is a prerequisite for dynamic integration; it does not need to have any specific level of precision. We iterate through the whole year 2011 in one-month steps. In other words, we generate a county-month spatial panel.

# First select the corresponding columns only
ACLED.fatalities = ACLED[,names(ACLED)%in%c('timestamp','FATALITIES')]
AidData.commitment = AidData[,names(AidData)%in%c('timestamp','commitme_1')]
# Run geomerge using point.agg = 'cnt'
output = geomerge(ACLED.fatalities,AidData.commitment,
                  target=states,time=c("2011-01-01","2011-12-31","1"),
                  t_unit='months',point.agg='cnt')
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(ACLED.fatalities, AidData.commitment, target = states, 
##     time = c("2011-01-01", "2011-12-31", "1"), point.agg = "cnt", 
##     t_unit = "months")
## 
##  Running geomerge in dynamic mode.
##  Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
##  Aggregating point data for period 1... Done.
##  Aggregating point data for period 2... Done.
##  Aggregating point data for period 3... Done.
##  Aggregating point data for period 4... Done.
##  Aggregating point data for period 5... Done.
##  Aggregating point data for period 6... Done.
##  Aggregating point data for period 7... Done.
##  Aggregating point data for period 8... Done.
##  Aggregating point data for period 9... Done.
##  Aggregating point data for period 10... Done.
##  Aggregating point data for period 11... Done.
##  Aggregating point data for period 12... Done. 
##  Dataset ACLED.fatalities successfully merged to target.
##  Dataset2: AidData.commitment (SpatialPointsDataFrame)
##  Aggregating point data for period 1... Done.
##  Aggregating point data for period 2... Done.
##  Aggregating point data for period 3... Done.
##  Aggregating point data for period 4... Done.
##  Aggregating point data for period 5... Done.
##  Aggregating point data for period 6... Done.
##  Aggregating point data for period 7... Done.
##  Aggregating point data for period 8... Done.
##  Aggregating point data for period 9... Done.
##  Aggregating point data for period 10... Done.
##  Aggregating point data for period 11... Done.
##  Aggregating point data for period 12... Done. 
##  Dataset AidData.commitment successfully merged to target.
##  Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in dynamic mode, spatial panel was generated.
## 
## The following 2 numerical variable(s) are available:
##  ACLED.fatalities, AidData.commitment
## 
## First and second order temporal lag values available.
plot(output)
## Output data is spatial panel, showing results only for the last period. Use optional argument "period" to select specific time period.

# Run geomerge using point.agg = 'sum'
output = geomerge(ACLED.fatalities,AidData.commitment,
                  target=states,time=c("2011-01-01","2011-12-31","1"),
                  t_unit='months',point.agg='sum')
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(ACLED.fatalities, AidData.commitment, target = states, 
##     time = c("2011-01-01", "2011-12-31", "1"), point.agg = "sum", 
##     t_unit = "months")
## 
##  Running geomerge in dynamic mode.
##  Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
##  Aggregating point data for period 1... Done.
##  Aggregating point data for period 2... Done.
##  Aggregating point data for period 3... Done.
##  Aggregating point data for period 4... Done.
##  Aggregating point data for period 5... Done.
##  Aggregating point data for period 6... Done.
##  Aggregating point data for period 7... Done.
##  Aggregating point data for period 8... Done.
##  Aggregating point data for period 9... Done.
##  Aggregating point data for period 10... Done.
##  Aggregating point data for period 11... Done.
##  Aggregating point data for period 12... Done. 
##  Dataset ACLED.fatalities successfully merged to target.
##  Dataset2: AidData.commitment (SpatialPointsDataFrame)
##  Aggregating point data for period 1... Done.
##  Aggregating point data for period 2... Done.
##  Aggregating point data for period 3... Done.
##  Aggregating point data for period 4... Done.
##  Aggregating point data for period 5... Done.
##  Aggregating point data for period 6... Done.
##  Aggregating point data for period 7... Done.
##  Aggregating point data for period 8... Done.
##  Aggregating point data for period 9... Done.
##  Aggregating point data for period 10... Done.
##  Aggregating point data for period 11... Done.
##  Aggregating point data for period 12... Done. 
##  Dataset AidData.commitment successfully merged to target.
##  Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in dynamic mode, spatial panel was generated.
## 
## The following 2 numerical variable(s) are available:
##  ACLED.fatalities, AidData.commitment
## 
## First and second order temporal lag values available.
plot(output)
## Output data is spatial panel, showing results only for the last period. Use optional argument "period" to select specific time period.

Note: By default, plot selects the last time period for purposes of the visualization. To visualize any other period, add the optional argument period to the function call. Also, first- and second-order time-lagged variables are returned by default. The optional Boolean argument time.lag controls this feature.

output = geomerge(ACLED.fatalities,AidData.commitment,
                  target=states,time=c("2011-01-01","2011-12-31","1"),
                  t_unit='months',point.agg='sum',time.lag=FALSE)
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(ACLED.fatalities, AidData.commitment, target = states, 
##     time = c("2011-01-01", "2011-12-31", "1"), time.lag = FALSE, 
##     point.agg = "sum", t_unit = "months")
## 
##  Running geomerge in dynamic mode.
##  Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
##  Aggregating point data for period 1... Done.
##  Aggregating point data for period 2... Done.
##  Aggregating point data for period 3... Done.
##  Aggregating point data for period 4... Done.
##  Aggregating point data for period 5... Done.
##  Aggregating point data for period 6... Done.
##  Aggregating point data for period 7... Done.
##  Aggregating point data for period 8... Done.
##  Aggregating point data for period 9... Done.
##  Aggregating point data for period 10... Done.
##  Aggregating point data for period 11... Done.
##  Aggregating point data for period 12... Done. 
##  Dataset ACLED.fatalities successfully merged to target.
##  Dataset2: AidData.commitment (SpatialPointsDataFrame)
##  Aggregating point data for period 1... Done.
##  Aggregating point data for period 2... Done.
##  Aggregating point data for period 3... Done.
##  Aggregating point data for period 4... Done.
##  Aggregating point data for period 5... Done.
##  Aggregating point data for period 6... Done.
##  Aggregating point data for period 7... Done.
##  Aggregating point data for period 8... Done.
##  Aggregating point data for period 9... Done.
##  Aggregating point data for period 10... Done.
##  Aggregating point data for period 11... Done.
##  Aggregating point data for period 12... Done. 
##  Dataset AidData.commitment successfully merged to target.
##  Completed!
summary(output)
## geomerge completed: 2 datasets successfully integrated - run in dynamic mode, spatial panel was generated.
## 
## The following 2 numerical variable(s) are available:
##  ACLED.fatalities, AidData.commitment
plot(output, period=3)
## Output data is spatial panel, showing variables only for period 3, as specified.
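
The time-lagged variables mentioned in the note above can be understood through a small base R sketch. The panel, the helper shift, and the column names events.t_1 and events.t_2 are hypothetical; the names of the lagged variables that geomerge actually returns may differ.

```r
# Hypothetical unit-month panel, sorted by unit and period.
panel <- data.frame(
  unit   = rep(c("A", "B"), each = 4),
  period = rep(1:4, times = 2),
  events = c(2, 0, 5, 1, 7, 3, 0, 4)
)

# Shift a series back by k periods, padding the start with NA.
shift <- function(v, k) c(rep(NA, k), head(v, -k))

# Lag within each unit so that values never leak across units.
panel$events.t_1 <- ave(panel$events, panel$unit,
                        FUN = function(v) shift(v, 1))
panel$events.t_2 <- ave(panel$events, panel$unit,
                        FUN = function(v) shift(v, 2))
print(panel)
```

The first one or two periods of each unit have no lagged value, which is why they appear as NA.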

Generating Grid Target

Thus far, we have only considered integration targets in the form of the Nigeria county polygons (states). The generateGrid function in geomerge allows the user to easily generate a matching grid of a chosen resolution. For many econometric applications, this option can be very useful.

# install.packages("sp")
library(sp)

# Generate grid with 10 km cell size (input in m) in local CRS for Nigeria
states.grid <- generateGrid(states,
                            size = 10000, # meters
                            local.CRS = CRS("+init=epsg:26391"),
                            silent = TRUE)

# Run simple static integration with this grid as target
output = geomerge(ACLED.fatalities,target=states.grid,point.agg='sum')
##  geomerge: Geospatial data integration.
##  Karsten Donnay and Andrew Linke, 2017
## 
##  ATTENTION: Depending on the resolution and number of datasets, the merger may take some time!
## 
## 
##  geomerge(ACLED.fatalities, target = states.grid, point.agg = "sum")
## 
##  Running geomerge in static mode.
##  Dataset1: ACLED.fatalities (SpatialPointsDataFrame)
##  Aggregating point data... Done. 
##  Dataset ACLED.fatalities successfully merged to target.
##  Completed!
summary(output)
## geomerge completed: 1 datasets successfully integrated - run in static mode.
## 
## The following 1 numerical variable(s) are available:
##  ACLED.fatalities
plot(output)
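
The idea behind a regular grid target can be sketched in base R without any spatial classes. The bounding box, the points, and all variable names below are invented, and the sketch assumes coordinates in a projected CRS measured in meters; it does not reproduce how generateGrid builds or clips its cells.

```r
# Hypothetical bounding box in a projected CRS with coordinates in metres.
xmin <- 0; xmax <- 35000
ymin <- 0; ymax <- 25000
size <- 10000  # cell size in metres

# Lower-left corners of all cells needed to cover the bounding box.
xs <- seq(xmin, xmax, by = size)
ys <- seq(ymin, ymax, by = size)
grid <- expand.grid(x = xs, y = ys)  # x varies fastest
grid$cell <- seq_len(nrow(grid))

# Find the cell containing each (made-up) point by integer division.
pts <- data.frame(x = c(500, 12000, 34999), y = c(500, 19000, 10000))
ci <- floor((pts$x - xmin) / size) + 1  # column index
ri <- floor((pts$y - ymin) / size) + 1  # row index
pts$cell <- (ri - 1) * length(xs) + ci

print(nrow(grid))
print(pts)
```

Halving the cell size quadruples the number of cells, which is one reason why finer grids make the merger noticeably slower.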